NSF PAR Search | NSF Public Access Repository


What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

https://doi.org/10.18653/v1/2023.acl-long.903

Michael, Julian; Holtzman, Ari; Parrish, Alicia; Mueller, Aaron; Wang, Alex; Chen, Angelica; Madaan, Divyam; Nangia, Nikita; Pang, Richard Yuanzhe; Phang, Jason; et al. (January 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
What Makes Reading Comprehension Questions Difficult?

https://doi.org/10.18653/v1/2022.acl-long.479

Sugawara, Saku; Nangia, Nikita; Warstadt, Alex; Bowman, Samuel (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

https://doi.org/10.18653/v1/2022.lnls-1.3

Parrish, Alicia; Trivedi, Harsh; Perez, Ethan; Chen, Angelica; Nangia, Nikita; Phang, Jason; Bowman, Samuel (January 2022, Proceedings of the First Workshop on Learning with Natural Language Supervision)

Full Text Available
BBQ: A hand-built bias benchmark for question answering

https://doi.org/10.18653/v1/2022.findings-acl.165

Parrish, Alicia; Chen, Angelica; Nangia, Nikita; Padmakumar, Vishakh; Phang, Jason; Thompson, Jana; Htut, Phu Mon; Bowman, Samuel (January 2022, Findings of the Association for Computational Linguistics: ACL 2022)

Full Text Available
QuALITY: Question Answering with Long Input Texts, Yes!

Bowman, Samuel R.; Chen, Angelica; He, He; Joshi, Nitish; Ma, Johnny; Nangia, Nikita; Padmakumar, Vishakh; Pang, Richard Yuanzhe; Parrish, Alicia; Phang, Jason; et al. (May 2022, NAACL 2022)

To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts. In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well. Our baseline models perform poorly on this task (55.4%) and significantly lag behind human performance (93.5%).
Full Text Available
QuALITY: Question Answering with Long Input Texts, Yes!

https://doi.org/10.18653/v1/2022.naacl-main.391

Pang, Richard Yuanzhe; Parrish, Alicia; Joshi, Nitish; Nangia, Nikita; Phang, Jason; Chen, Angelica; Padmakumar, Vishakh; Ma, Johnny; Thompson, Jana; He, He; et al. (January 2022, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

Nangia, Nikita; Sugawara, Saku; Trivedi, Harsh; Warstadt, Alex; Vania, Clara; Bowman, Samuel R. (January 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics)
Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data. But using crowdsourced, instead of expert judgments, to qualify workers and send feedback does not prove to be effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human–model gap on the unanimous agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.
Full Text Available
Does Putting a Linguist in the Loop Improve NLU Data Collection?

https://doi.org/10.18653/v1/2021.findings-emnlp.421

Parrish, Alicia; Huang, William; Agha, Omar; Lee, Soo-Hwan; Nangia, Nikita; Warstadt, Alexia; Aggarwal, Karmanya; Allaway, Emily; Linzen, Tal; Bowman, Samuel R. (January 2021, Findings of the Association for Computational Linguistics: EMNLP 2021)

Full Text Available
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al. (January 2023, Transactions on Machine Learning Research)

Full Text Available
